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Abstract. The Wako-Saito-Munoz-Eaton (WSME) model, initially introduced in the 
theory of protein folding, has also been used in modeling the RNA folding and some 
epitaxial phenomena. The advantage of this model is that it admits exact solution in 
the general inhomogeneous case (Bruscolini and Pelizzola, 2002) which facilitates the 
study of realistic systems. However, a shortcoming of the model is that it accounts 
only for interactions within continuous stretches of native bonds or atomic chains while 
neglecting interstretch (interchain) interactions. But due to the biopolymer (atomic 
chain) flexibility, the monomers (atoms) separated by several non-native bonds along 
the sequence can become closely spaced. This produces their strong interaction. The 
inclusion of non-WSME interactions into the model makes the model more realistic and 
improves its performance. In this study we add arbitrary interactions of finite range 
and solve the new model by means of the transfer matrix technique. We can therefore 
exactly account for the interactions which in proteomics are classified as medium- and 
moderately long-range ones. 



PACS numbers: 05.50. +q, 87.15.hm, 68.43.De 
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1. Introduction 

The WSME model is a generalization of the one dimensional (ID) lattice gas model with 
nearest neighbor (NN) interatomic pair interactions. In addition to the NN interactions, 
cluster interactions are present inside continuous chains of adjasent atoms. The model 
was initially introduced by Wako and Saito [U |2] and by Munoz and Eaton [31 H] to 
understand protein folding. The role of atoms played the peptide bonds. Recently this 
model was used to describe RNA folding [5]. Furthermore, a similar model was derived 
in a theory of strained epitaxy [61 [7] . 

The physical meaning of the cluster interactions is easily understandable in the case 
of coherent strained epitaxy. Let us assume that besides the attractive NN interaction 
Vi < the interatomic potential has a rigid core which does not let the atoms approach 
each other closer than the core diameter d [8j. So if the diameter is larger than the 
substrate lattice spacing a, the adatoms within an atomic chain will be displaced from 
the centers of the deposition sites by Uj oc /. The requirement of coherence means that 
the displacements should be small in order for the displaced atoms remained within the 
same lattice cell. In general this condition will be violated for sufficiently long chains 
but in the present study we consider only finite systems and assume that the misfit is 
sufficiently small for the condition of coherence to be satisfied. In this case the misfit 
energy of atom j in the harmonic approximation can be estimated as /ruj/2, where k is 
the curvature of the substrate potential near its minimum. The atomic displacements 
within a chain of length I can be found from symmetry considerations as 

J/C/ + 1/2) j = 0,l,...,//2-l /even 
U±J 1 fj j = 0,l,...,(/-l)/2 /odd ' [L) 



With the use of identities 



and 



j = m(m + l)/2 



=i 



= m(m+ l)(2m+ l)/6 

the total energy of the chain of length / after some algebra can be calculated as 

E (l) = y(i) l + Vi q _ ;Q + (kf/2A)l(l 2 - 1), (2) 

where is the adsorption energy per atom. 

Let us assume that N a < N adatoms are gathered onto N a /l equal chains of length 
/. The total energy of the system in this case will be equal to (N a /l)E"\ Thus, the 
energy minimum at fixed N a will coincide with the minimum of E®/1. From (j2J) it is 
easy to see that such a minimum always exists provided / =fi 0. This produces a simple 
model of self-assembly of size calibrated coherent nanostructures similar to quantum 
dots. 
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Formally the hamiltonian (or, more precisely, the configuration dependent free 
energy) of the WSME model is[HI21l31HlEllZllS] 

N N i 

<s ) me=ee^ ) n »*' ( 3 ) 

1=1 i=l k=i—l+l 

where in the case of epitaxy N is the total number of deposition sites, rij = 0,1 describes 
the occupation of site j by the gas atom, are inhomogeneous (i. e., site-dependent) 
interactions within the continuous atomic chains of length I ending at site i, as can be 
seen from the product in (j3J). As was shown in Equation (5) of [TU], if chain energies 

for all I are known, the values of in ([3]) can be found as the discrete second 
derivative of with respect to /. Furthermore, to simplify notation we assume the 
chemical potential /i to be included (with the minus sign) into the parameters 

In the case of biopolymers the interpretation of the interactions in the model is 
different. Firstly, the "atoms" in this case are either the amino-acid residues [H [2] or 
the covalent bonds [31 HI [9] which are assumed to be present in two states: the native and 
the non-native one corresponding to the values 1 and of a binary variable, respectively 
[H [21 [3j HI [5] . Thus, N can be either the number of peptide bonds connecting N + 1 
amino-acid residues or the number of the residues themselves. In the more developed 
bond model is the loss of conformation entropy by the bond in the process of 
formation of the native state 0,111 [9]. Cluster interactions V® in ([3]) can be presented 
in the form [9] 

Vf = eiAji, (4) 

where i and j = i — I + 1 are the ID coordinates of two peptide bonds; Aji = 1 
if the bonds are in contact with each other and is equal to zero otherwise; is the 
inter- residue interaction energy between residues i and j + 1. Farther details are given 
in [9\. 

The binary matrix Aji defines the contact map of a protein in its native state. It 
depends on the definition of the residue contact. The major parameter here is the cutoff 
distance between atoms which separates the atoms considered to be in contact from 
remote atoms. For example, if the distance is chosen to be 8 A then each residue on 
average contacts with approximately ten other residues (see Table 1 in [H]). For smaller 
values of the cutoff chosen in [2] the average number of contacts per residue is < 3 (see 
Table I in [2J). 

The qualitative similarity between the epitaxial and the folding models can be seen 
on the lattice protein folding model considered in [12]. According to the model, the 
folding starts at random places in the process of nucleation of a local native structure. 
The binary bond variables inside the regions are all equal to unity while in other regions 
the variables are zero. Statistically such behavior can be described by the WSME model. 

Despite being ID, the WSME model has two peculiarities hampering its exact 
solution. The first is the absence of the translational invariance which makes inapplicable 
the efficient techniques of the homogeneous case [H [21 El [10]. The other peculiarity is 
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that the hamiltonian ([3]) contains long-range interactions so the conventional transfer 
matrix (TM) method cannot be used. Because of these peculiarities, the exact solution 
for the WSME model at equilibrium in the inhomogeneous case was found only recently 
[9]. This solution, on the one hand, greatly facilitates the study of the kinetics of 
folding [13], on the other hand, it allows for the modeling of the strained epitaxy 
in inhomogeneous environments, such as alloyed substrates [7]. This latter case is 
of interest in connection with engineering applications where the ID nanostructures 
(such as nanowires, nanomagnets, nanotubes, etc.) may have important applications 
[HI HSl [161 HZl HH [19l [20I [2U [22I [23I [Ml Eg. Because in the device environment the 
wire (for example) may traverse different chemical surroundings, make turns, experience 
disordered substrate potential due to doping, etc., the interaction parameters describing 
the model should in general be position-dependent. 

In the epitaxial systems, however, the inhomogeneous WSME model describes only 
ID chains of atoms or molecules. But in practical applications more than monatomic 
structures can be needed in order, e. g., to enhance the conductivity of a nanowire or to 
increase the magnetic moment of a nanomagnet. These structures may consist of several 
adjacent atomic rows on a terrace of a vicinal surface or on the surface of a nanotube. 
Such quasi-2D structures can be described with the use of the WSME model only if at 
least further neighbor pair interactions are added to the hamiltonian (]3]). 

Indeed, let us consider the topology of the deposition sites shown on figure [U This 
topology may correspond to the deposition sites on the terrace of a vicinal surface with 
a rectangular geometry. In this case atoms 2 and 6 or 7 and 11 will be nearest neighbors 
on the substrate lattice but not along the ID lattice. But if the substrate has the 
geometry of triangular lattice with the angle 2-1-5 being equal to 120°, the atoms 1-6 
and 7-12 (for example) will constitute additional nearest neighbor pairs. If, farther, the 
sites in figured] are rolled into a cylinder with the chiral vector (4,0), the pairs of sites 
of type 1 and 4 or 9 and 12 become nearest neighbors too. Other tube topologies will 
bring together other atoms. An example of the nanotube with chirality (4,1) will be 
given in section HI Obviously in all these cases the neglect of the interactions between 
the atoms on the nearest neighbor sites of the substrate lattice will qualitatively change 
the physics of the system under consideration. But such interactions in the ID lattice 
coordinates will be further neighbor interactions (4-th neighbors in the above case of 
the tube of rectangular geometry) which do not enter into the WSME hamiltonian (|3j). 

Similar arguments can be applied also to 2- and 3D lattice models of proteins 
[26] [12] . Munoz and Eaton noted in this connection [3] that the inclusion of interactions 
between the bonds belonging to different native stretches into the WSME model not only 
will improve quantitative description of the kinetics but also "would add considerable 
flexibility to possible structural mechanisms by producing additional routes between the 
denatured and native states" . 

The non-WSME interactions may be of qualitative importance in differentiating 
between the proteins and the RNA folding. While in some respects being similar, the 
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Figure 1. Black points represent the deposition sites on the substrate; solid line 
with numbers shows a possible mapping of the sites on a ID lattice with integer 
coordinates. Dashed lines connect nearest neighbor sites on the substrate lattice with 
rectangular geometry. These sites are the nearest neighbors on the substrate but the 
fourth neighbors in the ID lattice coordinates. For farther information see the text. 

folding of the two polymer types differ [27]. In particular, in contrast to proteins, the 
RNA native state is hierarchical in that the secondary structure is energetically well 
separated from the tertiary one and can be considered as a collection of base pairings 
[28j [29j [30] . Because of this, in the standard model of the RNA the energetics is governed 
by the medium and long range pair interactions [291 [31] . 

Last but not least, even when the larger distance along the ID lattice corresponds 
to larger separation in real space so that further neighbor interactions are small, in some 
cases they also may cause qualitative phenomena. For example, if the model considered 
at the beginning of this section is augmented by the substrate mediated repulsive dipole 
pair interactions J32j [33] or, alternatively, by attractive interaction of some other origin 
[10] the model, in addition to the self-assembly and size calibration, will simulate the 
important phenomenon of self-organization of quantum dots into periodic arrays. Thus, 
there exist many situations when farther neighbor interactions are among the most 
important ones in the system and cannot be neglected without qualitatively changing 
the system properties. 

But besides the pair interactions, sufficiently strong cluster interactions of non- 
WSME kind are also usually present in epitaxial systems (see the next section). In this 
study we present therefore a TM solution of the WSME model augmented by arbitrary 
short range interactions. Because of the restrictions imposed by the TM technique, 
in practice the method will be restricted to the interactions of relatively short range. 
Nonetheless, fairly large radii up to those which in proteomics are classified as the 
medium- and moderately long-range interactions are feasible to exact treatment within 
our approach. 

In the next section we introduce our extension of the WSME model, in section [3] 
explain its formal solution and in section H] present a simple illustrative calculation. In 
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the concluding section Owe discuss our results. 
2. Extended WSME model 

The pair interactions, such as the Coulomb or the van der Waals are the most ubiquitous 
in nature and their account should be the primary goal in extending the WSME model. 
But more complex cluster interactions (CIs) can also be important, especially in metallic 
systems where the pair approximation holds only approximately. For example, ab initio 
calculations show that interactions within atomic trios deposited at the surface may 
have the same magnitude as the nearest neighbor pair interactions, i. e., to be among 
the strongest in the system [3U EE]- To account for all possible CIs, in the ab initio 
theories of alloys and epitaxial systems the method of interatomic interaction expansion 
over a complete set of CIs has been developed [36l [37], [34]. It is pertinent to note that 
binary alloy is formally equivalent to the lattice gas model. In order for our approach 
to be compatible with this powerful technique, we developed it for an arbitrary set of 
CIs restricted only by the maximum values of their interaction radii. This is necessary 
for the computational tractability of the TM equations. 

Let us first consider the most general hamiltonian which includes all possible CIs 
in the system of size N 



where the subscript N — 1 denotes the maximum interaction radius. We define the 
radius of a CI as r = i max — i m i n , where i m ax{min) is the largest (smallest) index of 7ij 
in the cluster, q = 0, 1 and, by definition, nf = 1. The CIs in (jHJ) are characterized by 
sequences of binary digits which can be gathered into the number 



where the bar over a number denotes that its binary representation is meant; the 
subscript B denotes that the term within parentheses is the binary representation, not 
the product and Wc is the strength of the corresponding CI. In (jSJ), (EJ) and below we list 
the terms in the products in reverse order because in our TM approach it is convenient 
to number the sites from right to left. 

The total number of CIs in hamiltonian (JSJ) is of 0(2^) which is a huge number for 
even modest systems of sizes iV m 50 characteristic for the smallest proteins. Only a 
small part of 0(iV 2 ) of the CIs from (jSJ) enters into the hamiltonian (EJ). Obviously, in the 
general inhomogeneous case it would be impossible to take into account all interactions 
for a system of practical interest. To make the problem manageable, we restrict the 
extent of the interactions by some maximum radius R. 

The extended WSME model we will solve in the next section has the hamiltonian 




(5) 



C = (Cjv-l • • • CiC )b, 



(6) 




(7) 
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where the second term on the right hand side is defined as the sum of all such terms in 
(jSJ) that do not contain interactions of radii exceeding R and, besides, in order to avoid 
double counting these terms should not enter into (J3]). 

3. Recursive transfer matrix solution 

In physical terms the main difference between the models represented by hamiltonians 
fl3]) and (J7J) is as follows. In fl3]) due to the specific form of interactions the energy of any 
configuration is the sum of energies of the continuous atomic chains (or the stretches of 
the peptide bonds) it contains. The interactions in ([3]) become zero as long as atomic 
clusters are separated by a single empty site. Thus, in the homogeneous case the system 
can be considered as a mixture of non-interacting molecules of N kinds (different sizes) 
[10] . Presumably because of this simplicity the homogeneous case was solved much 
earlier than the general case [U [2] . 

The additional term in (J7|) changes drastically the situation even in the 
homogeneous case because now not only different chains interact but their interactions 
are quite nontrivial. For example, in the case of R — 14 considered in [35], up to eight 
islands may be interacting via appropriate CIs. Because of this, there seems to be no 
way of accounting for all possible situations except through their direct enumeration. 
In the case of finite range interactions and in ID this can be done recursively by adding 
sites to the system one by one. 

So let us for the time being neglect in ([3]) all interactions whose radii exceed R. 
Because of the finite interaction range, when adding a site to the system consisting of 
K > R sites only the interactions with the last R sites need be taken into account. 
The accounting can be done with the use of the vector partition function Z^ K > whose 
components are the partial traces over all except the last R sites 

Zi% K _ u ..., nK _ R+1 = Tr ni ,n 2 ,...,n K - R exp(-H^). (8) 

Here is a hamiltonian (JJJ) for a K-site system which contains only interactions 

within the range not exceeding R. The total partition function is found from (jSJ) as 

Z^=^_zf\ (9) 

<5=0 

where the bar over the number has the same meaning as in (J6]). 

As was shown in Appendix of [35] and will be explained in more detail below, a 
recurrence relation for Z^ N ' in the number of sites in the system N can be established. 
This technique is an extension of the methods developed in connection with the 2D 
Ising model in [38, 39J. Its advantage is that it deals with sparse TMs which provide 
considerable gain in computational effort in the case of large R. 

If we assume that the vector partition function for the system of size N — 1 is known 
then the partition function for the size N can be calculated recursively with the use of 
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the sparse TM as 
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(10) 



where the column vectors correspond to Z^ N ^ N ~^\ the empty and filled circles describe 
the empty (rij = 0) or filled (rij = 1) sites in the subscripts of the partial partition 
functions in flH]) and the subscript N of the TM is the site index for all b^ entering the 
matrix. We note that we use the same symbol N for the system size and for the recurrent 
relation to stress that at every iteration we obtain the (vector) partition function of a 
system corresponding to some size N. 

In the case of finite-range interactions the structure of TM in (TlOl) is easily 
understood. Having added site N to the system consisting of N — 1 sites we first 
have to account for the interaction of this site with the rest of the system and then take 
the trace over the (N — -R)-th site because with the radius of interactions being R all 
interactions of this site with the rest of the system have already been taken into account. 
Taking the trace amounts to adding with appropriate weights two Z^" 1 ) differing by 
the filling of site N — R. In the case of the empty site N the weights are equal to unity 
because the empty site does not interact with anything. These terms occupy the upper 
half of the TM ( flOj) . The lower half of the matrix contains the terms corresponding to 
the interaction of the occupied site iV with the rest of the system. The term 

baN = exp(-AEa N /kBT) (11) 

is the Boltzmann weight corresponding to the interaction of the atom at site iV with 
the configuration of atoms corresponding to Z$ ; AE&n in ( TTTj) is the energy of 
interaction of the atom with configuration a. 

Now, what have to be changed in order to include the arbitrary range interactions of 
the WSME type into the recursion scheme ( TlOl) ? It turns out that only the last equation 
need be modified. This is because the hamiltonian ([3]) contains only the interactions 
inside continuous chains. But all components of the state vector except the last 

one contain at least one empty site among the last R sites. Therefore, the extent of 
the chain interactions is restricted by the distance to the nearest empty site and thus is 
smaller than R. 

In the last component, however, all sites are filled. So when adding an additional 
iV-th site filled with an atom we do not know which interactions of the WSME type 
should be taken into account as the last R atom may belong to a chain of any length — 
from R to N — 1. We overcome this difficulty in a straightforward manner by simply 
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taking into account all the possibilities. Namely, we replace the last two-term equation 
in the set ( 110]) with the sum over all configurations where the last R sites belong to 
a chain of length greater or equal to R. This can be achieved with the use of the 
component Z^ =••...• mo with all sites except the first one being filled. Note that 
the positions are counted from right to left. When adding chains of different lengths to 
this component we can control the total chain length and thus know which interactions 
from 03]) should be taken into account. 

Formally this is done as follows. In the course of the recursive solution we keep the 
array of components Z^ for M = R — l, . . . ,N — 1 (the explanation of the term R — l 

is given below) ; in another array we gather the chain energies E$ . These account for all 
interatomic interactions entering (j^) inside the chains of length I ending at site N. The 
chains are assumed to be isolated so no interchain interactions enter E®. By attaching 
a chain of length N — M + R— 1 to the configuration corresponding to Z^ which 
amounts to multiplying the latter by the corresponding Boltzmann factor, we obtain a 
configuration with a continuous chain of atoms starting on site M — R + 1 and ending 
on site N. As is seen, the (R — l)-atom chain in Z^ ending at site M and the chain 
ending at N overlap at sites inside the (sub)chain of length R—l. In the equation below 
this double counting is taken care of by the division by the necessary Boltzmann factor 
corresponding to the chain of length (R— 1): 

N-l 



2 R -1 

M=R-1 



J2 eM-E^ M+R -Vk B T]z!™L, (12) 



where 



Z^Z^/eM-E^/ksT}. (13) 

The meaning of ffl2l is simple: the component of Z^ with the last R sites being filled is 
obtained as the sum of all possible configurations having the chains of length R < I < N 
as their end sites. As is easy to see, the factor Z-^P— is sufficient for accounting for all 
short-range interactions of the chain of length N — M + R — l with the rest of the system 
because the atoms at sites M + 1 and larger cannot reach the atoms beyond M — R 
due to the finite interaction range. The only remaining problem is connected with the 
longest chain of length iV which should comprise the whole system because Z R starts 
with an empty site. This difficulty is overcome by initializing the recurrence ([IU]) with 
2>{ R ~ X > , i. e., with the system containing only R—l sites instead of R. The fillings of 
these sites correspond to the last R — l sites in the vectors in fflO]) . i. e., the rightmost 

fry i\ 

column in these vectors should be crossed out so the component Z has all its sites 
filled. The components corresponding to the empty crossed out sites are calculated as 
the conventional Boltzmann factors while in the cases when the omitted site was filled 
they are all set to zero. The validity of this initialization of the recurrence (TlQ]) can 
be proven either by a straightforward calculation of Z^ via one iteration step and 
comparing it with calculated straightforwardly, or by associating with the crossed 
out column a fictitious 0-th site which has an infinite on-site energy. Thus, on the one 
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hand, the initial Z^ R ~^ corresponds to the system of size R (0, . . . , R — 1); on the other 
hand, the components corresponding to this site being filled are all zero, as suggested 
above. 

4. Illustrative calculation 

As can be seen from ffl2|) . formally our algorithm is quadratic in the system size N. 
This means that for sufficiently large systems the calculations may become prohibitively 
difficult to perform. The sizes of biopolymers met in nature, however, are restricted [40J. 
Because from practical point of view of major interest are natural biological molecules, 
we will restrict our discussion to this case. A typical protein consists from about 500 
amino acid residues |31J . So in order to assess numerical performance of our algorithm 
we consider for simplicity an epitaxial model of this size. The epitaxial systems of 
similar sizes are of interest also for the nanoengineering. Because the devices of sizes 
in tens of nanometers (hundreds of atoms) are efficiently modeled in the framework of 
continuum approximations [4T1 l4"2l 43J . our approach may be useful in studies of smaller 
few-nanometer structures [14J. Thus, the length in 500 atomic diameters (about 100 
nm) is, presumably, an upper limit of interest for the atomic simulations of epitaxial 
systems (the issue of their width will be discussed below). 

We consider coherent strained epitaxy on the surface of a finite size screw (4,1) 
nanotube with rectangular substrate lattice geometry (see figure []]) with homogeneous 
interactions and consisting of 500 deposition sites. This geometry was chosen because 
the diameter four would correspond, inter alia, to a model of the a-helix similar to that 
considered in [H] but with additional pair interactions. This may be used to model 
the helices with different interbond interactions to better describe their properties. The 
(4,1) topology means that sites 2 and 6 or 7 and 11 in figure [U are nearest neighbors 
along the direction parallel to the tube axis while the sites along the solid line are all 
equivalent. For example, the interaction between atoms at sites 2 and 3 is the same as 
between those at sites 8 and 9 because all points along the line belong to a helix. The 
potential of the substrate (the tube surface) is periodically corrugated along the helix 
and will be treated in the harmonic approximation (T5]), as discussed in the Introduction. 

Besides the positive misfit energy, the atoms in our model experience, apart from 
the NN interaction v±, small attraction between the first and the second neighbors 
along the helix. Because of the homogeneity of the model, the nomenclature of (jSJ) is 
superfluous, so below we denote this interaction as t>2. Besides, v± will designate the 
repulsion between the atoms which are the fourth neighbors along the helix but are 
NN on the substrate surface (see figured]). This model qualitatively describes the large 
misfit systems studied in [25] and |24j . In these papers it was found that while on 
the tubes of large diameters the interaction between the nearest neighbor adatoms is 
repulsive along both directions, on those of small diameters the interaction along the 
high curvature direction became attractive due to the increased interatomic distance. 
But the interaction in the direction of small curvature along the tube axis remains 
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repulsive even for the small-diameter tubes. 

In the explicit calculations below we used the following values in units of the NN 
attractive interaction v% < 0: 

u 2 = 0.3ui, u 4 = 0.2|v a | and kf 2 « 6.8 • 10 _3 |^i|. (14) 

The energy of an isolated chain of length I can be obtained from (j2j) by adding to it the 
terms due to v 2 and v 4 

£(0 = £ (/ - *) >0 t;,, + (kf 2 /24)l(l 2 - 1) - A**, (15) 

i=l,2,4 

where the subscript > means that only positive values of (/ — i) contribute and V' 1 ) 
in (j2J) was set equal to — //. 

Numerical values of the pair interactions (TT4~|) were chosen in such a way that in 
the absence of misfit (/ = 0) the reduced energy E^/l did not have a global minimum 
at finite value of I so that the system were of phase separation type. This means that 
the atoms at low temperature tend to gather into one cluster. In the presence of the 
misfit, however, E^/l has a local minimum at I = 12. This choice means that the chain 
makes three turns around the tube which approximately corresponds to the structure of 
the typical a-helix. This model was solved with the recursive technique of the previous 
section at three different temperatures for the system consisting of 500 sites at half 
coverage (250 atoms). In figure [2] are shown the size distributions of chains of different 
lengths on the surface of a cylinder under consideration. As is seen, at the highest 
temperature the size distribution is similar to the random distribution of atoms while 
at the lowest temperature it exhibits very good size calibration with > 96% of atoms 
belonging to chains of lengths 11, 12 or 13. Thus, in the presence of the misfit the atoms 
gather into chains of about 12 atoms each. This result is in accord with the theory [15] 
but it was not obvious from the start because the interisland interactions are known to 
shift the calibrated size from its noninteracting value at the minimum of E^ l > /I. 

All calculations were performed on a modern Intel® processor with the use of 
Python scripts. This choice was motivated by the problem of numerical underflow or 
overflow which appeared in the calculations. The problem is rooted in the exponential 
scaling of the partition function with the system size: ~ exp(iV0), where is the 

reduced (per site) grand potential. Because of this, at sufficiently large N Z^ N ' may 
acquire arbitrarily large or small values which exhaust any fixed numerical range. In 
Python libraries, however, there exists the module decimal which allows for calculations 
with extremely small and/or extremely large numbers. 

When present, this problem severely hampers the calculations. For example, in 
the above model with N = 500 and the parameters shown in figure [2] calculation of 
2f(500) required only a fraction of a second because the conventional double precision 
arithmetic was sufficient: The maximum values of were of O(10 101 ). But this 

means that the system with, e. g., 5000 sites will have Z^ 5000 ) ~ 0(1010) which in our 
approach would require the use of the module decimal. Indeed, explicit calculation 
gave Z^ 5000 ) w 1.5 • 10 1011 . This calculation took almost 35 minutes which is about 
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Figure 2. Low temperature size calibration of self- assembled atomic clusters in the 
model described in the text. The number of clusters of different sizes at different 
temperatures: — T = \vi\, H — T = 0.1|vi| and O — T = 0.01|«i|. The curves 
are guides to the eye except the monotonous dashed curve which describes random 
coverage. 



150 times longer than the calculation at this size without the module. Repeated with 
the parameters where the double precision was sufficient it took only about 14 seconds. 
Thus, the software realization of high-precision arithmetics costs more than two orders 
of magnitude in performance. If this is characteristic for all such software, much 
better choice is to use the quadruple precision realized in some C/C++ and Fortran 
compilers. In addition to much smaller overhead due to higher precision, the compiled 
languages offer additional speed up in about two orders of magnitude in comparison 
with the interpreted languages such as Python. In this way the calculations with large 
biopolymers should be very fast. For example, the calculation of z( 35000 ) with the Python 
script in the case when the double precision was sufficient took less than 12 minutes 
which with the compiled language should take only a few seconds. 

It should be reminded that all calculations were performed for R = 4. As can be 
seen from ([10]) and (fl2l . the equations have the form of scalar products of vectors of sizes 
2 R and N, respectively. This means that the calculations are trivially parallelizable, so 
the execution speed at large R will depend on the number of processors available for the 
calculation and on the performance of one processor. The latter can be assessed from 
the model calculation of Z^ 500 ^ with R = 20 or O(10 6 ) equations in the set (fTUl) with the 
use of a Python script which lasted about 15 minutes. This means that with a compiled 
language the time of the calculation will measure in seconds. We estimate that the 
calculations with R in the range < 40 should be feasible on modern supercomputers. 



Transfer matrix solution of the WSME model augmented by short range interactions 13 
5. Discussion 

In this paper we presented a transfer matrix solution of the WSME model extended 
to account for arbitrary short range interactions. The transfer matrix approach is, in 
principle, a universal technique capable of solving any lattice problem with short range 
interactions. In practice, however, it is restricted to relatively small interaction radii 
due to the exponential growth of the computational effort with R. This restriction does 
not allow the method to be considered as a universal tool for obtaining exact solutions 
in dimensions D>1 because in statistical mechanics one is usually interested in the 
thermodynamic limit which corresponds to R — » oo and thus is unaccessible to the TM 
technique in truly 2- or 3D systems [38j [391 135] . 

The natural biopolymers, however, though sometimes very large, are restricted in 
their maximum size, so their characteristic dimensions are also finite. According to 
current nomenclature the radii of short and medium range interactions in proteins do 
not exceed 20 residues [T1J. As we saw, this case causes no difficulty even for a single 
processor computer. On a supercomputer with tens to hundreds parallel processors 
even the moderately long-range interactions of the extent R < 40 studied in [T2] should 
cause no problems. Thus, in the case of proteins our approach potentially allows for the 
exact solution of the extended WSME model with arbitrary medium- and moderately 
long-range interactions. According to [H] (see their fig. 4), the interactions with ranges 
exceeding 40 constitute only about 5% of all interactions. Thus, the technique developed 
in the present paper allows to improve up to 95% of all interactions. 

The RNA molecules are less amenable to the study within our TM technique 
because the double-stranded nature [271 E] of the polymer makes the pair interactions 
very long-ranged, up to the total molecule length when the first and the A-th nucleotides 
pair. Therefore, the pair interactions in the ranges extending not farther than about 20 
pairs along the stem away from the hairpin loops can be treated within our approach. 
The pseudo knots formed by nearby hairpin heads are other potential candidates for 
the description with non-WSME interactions [29J. An alternative way of describing the 
nucleotide interactions is to include them into the WSME part [5]. 

In the case of epitaxial systems the maximum interaction radius R < 40 lattice 
units restricts the application of the method to the nanotubes of similar and even lesser 
circumference [35] . In this connection it is pertinent to note that R ~ 20 approximately 
corresponds to the upper limit of the tube size when the high curvature of small diameter 
nanotubes can qualitatively change the ordering of adsorbates consisting from large 
atoms [25 j . In the case of deposition on the terraces the upper limit of width (< 40 
atomic rows) even exceeds the width of the terraces (16 rows) used in [21] for epitaxial 
growth of magnetic nanostructures. Such a restriction of the accessible widths is not 
very serious from the practical point of view. In the case of wide nanowires in tens of 
atoms the atomic resolution is not very important because an error in a few atoms can 
be neglected in most cases. Nanostructures of such sizes can be efficiently simulated 
within continuum approximations |4"l j |4"2"1 14"3]. 
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Furthermore, the short range interactions can be used to describe the substrate 
propagated elastic dipole-dipole interaction in ID model of strained epitaxy proposed 
in [6] . The dipole-dipole interaction behaves as the inverse cube of the distance [32j [33] 
and so at R ~ 10 — 20 in ID systems can be neglected in most cases. 

It should be mentioned that the direct push interaction [8] leading to the 
interactions of the WSME type in ID or on the screw tubes is not operative in wires 
on the terraces of width greater than two (on the non-rectangular substrate the wires 
consisting of two rows may still contain such a contribution). In this case the origin 
of some of the WSME type interactions can be different. For example, the interaction 
corresponding to the largest cluster containing all atoms differentiates two cases: the 
fully filled terrace and the terrace with one vacancy. Such an interaction may account 
for the volume contribution to the vacancy formation enthalpy. Thus, there is enough 
interesting epitaxial systems (and we mentioned only a few of them) which can be 
simulated with the exact transfer matrix technique developed in the present paper. 
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